GENOMIC MISMATCH SCANNING



CROSS-REFERENCE TO GOVERNMENT GRANT 
This invention was made with Government support under
contract HG00450 awarded by the National Institutes of
Health. The government may have certain rights in this
invention.

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part of
application serial no. 880,167 filed on May 6, 1992.

INTRODUCTION
Technical Field 

The field of this invention is genetic mapping. 

Background 

Linkage mapping of genes involved in disease
susceptibility and other traits in humans, animals and
plants has in recent years become one of the most
important engines of progress in biology and medicine.
The development of polymorphic DNA markers as landmarks
for linkage mapping has been a major factor in this
advance. However, current methods that rely on these
markers for linkage mapping in humans are laborious,
allowing at most a few markers to be screened at a
time. Furthermore, their power is limited by the sparsity
of highly informative markers in many parts of the human
genome.

To map genes whose manifestations are recognized only
in the whole organism, the standard approach relies on
identifying linkage between the trait and a genetic marker
whose map position is already known. The most abundant
and generally useful class of markers in the human genome
is DNA sequence polymorphisms, either restriction fragment
length polymorphisms (RFLPs) or other DNA polymorphisms
that can be detected by hybridization with specific probes
or by amplification using specific primers.

RFLP mapping and related methods have had dramatic
successes. However, their utility is limited by two
problems. First, many discrete loci need to be examined,
but only one or a few loci can be typed at a time, which
makes the procedure arduous. Second, the low density on the
map and low polymorphism information content (PIC value) of
available markers mean that multiple members of each
family need to be typed to obtain useful linkage
information, even when only two share the trait of
interest. While these disadvantages can be overcome to
some degree by automation and technical improvements, as
well as by developing closely spaced, extremely polymorphic
markers, the use of discrete markers for specific map
intervals has inherent limitations. The limitations are
particularly marked when applied to mapping strategies
that seek to reduce the number of individuals that need to
be analyzed in each family.

It is generally much easier to collect many pairs of
related individuals who share a trait of interest than to
collect a few large, well-documented pedigrees in which
the trait is segregating. In the former case, the
absolute number of individuals is smaller. For medically
significant traits, affected individuals are likely to
present themselves for examination, whereas other family
members need to be traced and recruited. Furthermore,
individuals vital to pedigree analysis may be deceased.
Yet the low density and low information content of
available markers make the use of pedigrees almost
mandatory for linkage mapping by RFLP analysis.
Collection of appropriate families frequently poses the
principal barrier to mapping genes that influence human
traits, particularly genetically complex traits.
Strategies have been reported for linkage mapping using
the information in DNA from multiple very small sets of
affected relatives (typically pairs or even single
individuals). However, these strategies depend upon the
availability of closely spaced, highly informative genetic
markers throughout the genetic map.
The issues raised above in reference to linkage
mapping also apply to genetic risk assessment in medicine.

In principle, any base that differs among allelic
sequences could serve as a marker for linkage analysis.
Single-base differences between allelic single-copy
sequences from two different haploid genomes have been
estimated to occur about once per 300 bp in an outbred
Western European population. This calculates to a total
of about 10^7 potential markers for linkage analysis per
haploid genome. Only a tiny fraction of these nucleotide
differences contributes to mapping using current methods.
There is, therefore, substantial interest in developing
new methods that utilize the available genomic information
more efficiently and can provide information concerning
multi-gene traits. Such methods could be valuable, not
only for gene mapping, but also for genetic diagnosis and
risk assessment.
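
The figure of about 10^7 potential markers can be checked with simple arithmetic. The sketch below assumes a haploid genome of roughly 3 x 10^9 bp; that size is not stated in the text and is used here only for illustration, as is the function name.

```python
# Assumption (not stated in the text): a haploid human genome of ~3e9 bp.
HAPLOID_GENOME_BP = 3_000_000_000
# From the text: about one single-base difference per 300 bp.
BP_PER_DIFFERENCE = 300


def potential_markers(genome_bp: int, bp_per_difference: int) -> int:
    """Number of allelic single-base differences expected per haploid genome."""
    return genome_bp // bp_per_difference


markers = potential_markers(HAPLOID_GENOME_BP, BP_PER_DIFFERENCE)
print(f"about {markers:.0e} potential markers per haploid genome")
```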

Relevant Literature 

The use of RFLPs is described
in Botstein, et al. (1980) Am. J. Hum. Genet. 32:314-331;
Donis-Keller, et al. (1988) Cell 51:319-337; Kidd, et al.
(1989) Cytogenet. Cell Genet. 51:622-947 and Risch (1990)
Am. J. Hum. Genet. 46:242-253. Mapping strategies may be
found in Risch (1990) Am. J. Hum. Genet. 46:229-241;
Lander and Botstein (1987) Science 236:1567-1570; and
Bishop and Williamson (1990) Am. J. Hum. Genet.
46:254-265. Sandra and Ford (1986) Nucleic Acids Res.
14:7265-7282 and Casna, et al. (1986) Nucleic Acids Res.
14:7285-7303 describe genomic analysis.

SUMMARY OF THE INVENTION 
Genomic analysis is achieved through the process of:
digesting DNA to be compared from two different sources,
usually individuals who are genetically related or
suspected of being genetically related, with a restriction
enzyme that cuts relatively infrequently; combining single
strands of the genomic fragments from the two individuals
under conditions whereby heterohybrids (hybrids containing
one strand from each individual) can be distinguished from
homohybrids (hybrids containing both strands from the same
individual); separating homohybrids from heterohybrids;
separating mismatch-free heterohybrids from hybrids with
mismatches; preparing labeled probes from the mismatch-free
heterohybrids; and identifying regions of genetic
identity between the two individuals by means of said
labeled probes. The mismatch-free heterohybrids provide
highly specific probes for regions of genetic identity by
descent, since sufficiently large hybrid DNA molecules
formed from non-identical regions are expected to have at
least one and usually many base mismatches.

Alternatively, one may map regions of heterozygosity,
and, by inference, homozygosity in a single individual by
isolating DNA fragments substantially free of multicopy
DNA; melting the DNA and reannealing to provide for hybrid
fragments; introducing nicks specifically in mismatched
DNA; labeling the nicked DNA; and using the labeled DNA as
probes to identify regions of heterozygosity. Regions of
homozygosity or hemizygosity (where all or a significant
portion of a chromosome is missing, e.g. aneuploidy) are
inferred by the absence of hybridized label.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS
Methods and compositions are provided for genomic
mapping by identifying regions of the genome at which DNA
sequences from two DNA sources are perfectly identical
over long stretches (typically 10^3 to 2 x 10^4 nt).
Depending upon the nature of the probe, the procedures may
vary. In a procedure that allows for labeling of regions
of genetic identity between two individuals, DNA sources
are digested with a restriction enzyme that cuts
infrequently; the resulting DNA is processed to isolate
mismatch-free heterohybrid dsDNA fragments free from
mismatch-containing or homohybrid fragments; the
mismatch-free heterohybrid fragments are labeled and then
used to identify regions of genetic identity between the
two sources.

Alternatively, to map regions of homozygosity or
heterozygosity in an individual genome, one may digest the
DNA from that individual; optionally, remove multi-copy
DNA; melt and reanneal the DNA; introduce nicks into
mismatched dsDNA; label the mismatched dsDNA; and use the
labeled DNA as probes for identifying genomic regions that
are heterozygous, leaving regions that are homozygous or
hemizygous having substantially lower labeling.

The DNA source may be any source, haploid to
polyploid genomes, normally eukaryotic, and may include
vertebrates and invertebrates, including plants and
animals, particularly mammals, e.g. humans. The DNA will
be of high complexity, where each of the sources will
usually have greater than 5 x 10^4 bp, usually greater than
10^6 bp, more usually greater than about 10^7 bp. Thus, in
any situation where one wishes to compare two sources of
DNA as to their genetic similarities, whether the sources
are related or not, the subject method may be employed.
Usually, the sources will be related, being of the same
species, and may be more closely related in having a
common ancestor not further away than six, frequently
four, generations.

For linkage mapping or genetic diagnosis, genetically
related individuals are required. Thus, the subject
method may find application in following segregation of
traits associated with breeding of plants and animals, the
association of particular regions in the genomic map with
particular traits, especially traits associated with
multiple genes, the transmission of traits from ancestors
or parents to progeny, the interaction of genes from
different loci as related to a particular trait, and the
like. While only two sources may be involved in the
comparison, a much larger sampling may be involved, such
as 20 or more sources, where pairwise comparisons may be
made between the various sources. Relationships between
the various sources may vary widely, e.g. grandparents and
grandchildren; siblings; cousins; and the like.

Depending upon whether the regions of genetic
identity or regions of non-identity are to be labeled, the
treatment of the DNA from the source will vary. The DNA
may be processed initially in accordance with conventional
ways, lysing cells, removing cellular debris, separating
the DNA from proteins, lipids or other components present
in the mixture and then using the isolated DNA for
restriction enzyme digestion. See Molecular Cloning, A
Laboratory Manual, 2nd ed. (eds. Sambrook et al.) CSH
Laboratory Press, Cold Spring Harbor, NY 1989. Usually,
at least about 0.5 µg of DNA will be employed, more
usually at least about 5 µg of DNA, while less than 50 µg
of DNA will usually be sufficient.

The following procedure will address solely the
methodology employed for isolating and labeling of DNA
corresponding to regions of identity-by-descent between
two related individuals. The total DNA from both cellular
sources is digested completely with a restriction enzyme
that cuts relatively infrequently, generally providing
fragments of strands of about 0.2-10 x 10^4 nt, preferably of
about 0.5-2 x 10^4 nt. The size is selected to substantially
ensure the presence of at least one GATC sequence, and at
least one base difference between any allelic fragments
not identical by descent, i.e. to ensure that homohybrid
fragments, or heterohybrid fragments that are not
identical by descent, sustain at least one cut in a
subsequent step. This enzyme will normally recognize at
least a 6-nucleotide consensus sequence and may involve
either blunt-ended or staggered-ended cuts. For the
method described in detail here, an enzyme that cuts to
leave a protruding 3' end is needed. The protruding 3'
ends are preferred specifically when exonuclease III
digestion, followed by BNDC binding, is ultimately used to
eliminate homohybrids and mismatched DNAs. Restriction
enzymes yielding other sorts of ends may be preferred if
other specific steps are substituted, as described further
below.
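
The reasoning behind the fragment size can be illustrated with a rough Poisson estimate of how often a fragment would lack any GATC site or any allelic base difference. This is only a sketch under stated assumptions: sites and differences are treated as randomly scattered, the GATC spacing of 256 nt (4^4) is an assumed value for random sequence, and the function name is hypothetical.

```python
import math

# Assumed mean spacings (illustrative only):
GATC_SPACING_NT = 256        # assumption: 4**4 for random sequence, equal base use
DIFFERENCE_SPACING_NT = 300  # from the text: about one difference per 300 bp


def prob_none(fragment_nt: float, mean_spacing_nt: float) -> float:
    """Poisson probability that a fragment of the given length has zero events."""
    return math.exp(-fragment_nt / mean_spacing_nt)


for size in (2_000, 5_000, 20_000):
    no_gatc = prob_none(size, GATC_SPACING_NT)
    no_diff = prob_none(size, DIFFERENCE_SPACING_NT)
    print(f"{size:>6} nt: P(no GATC) = {no_gatc:.2e}, "
          f"P(no base difference) = {no_diff:.2e}")
```

Under these assumptions, both probabilities fall off steeply with fragment length, which is consistent with the preference for fragments of several thousand nucleotides or more.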

The resulting DNA fragments are then processed to
provide for a means of separating complementary DNA
hybrids, where the two strands are from different sources,
from complementary DNA hybrids where the two strands are
from the same source. This may be achieved in different
ways. The method exemplified in the subject invention
uses the following steps. DNA from one of the sources is
methylated with a sequence-specific methylase, such as Dam
methylase or a restriction methylase, so as to
substantially completely methylate the consensus sequences
of the DNA from one of the sources. The other source is
left unmodified or methylated completely with a different
restriction methylase.

The two DNA samples are then mixed, denatured and
allowed to reanneal. A practical rate of complete
annealing of the complex DNA samples can be achieved by
using chemical or protein catalysts under conditions that
preserve large DNA strands (Casna, et al. (1986) Nucleic
Acids Res. 14:7285-7303; Barr and Emanuel (1990) Anal.
Biochem. 186:369-373; Amasino (1986) ibid 152:304-307). It
is also necessary to avoid or minimize network formation
resulting from rapid hybridization of non-allelic repeated
sequences, so that simple fully-duplex products can be
recovered even when dispersed repetitive sequences are
embedded in them. Annealing conditions that have been
shown to meet this requirement are FPERT conditions as
described in Casna (1986) supra.

The reannealed DNA mixture is digested with two
methylation-sensitive restriction endonucleases. Hybrids
formed between the two different DNA specimens will be
hemimethylated at the methylation sites. The restriction
enzymes are selected so as to digest sites which are
unmethylated or doubly methylated, while being incapable
of cleaving the hemimethylated sites. Desirably, the
sequence will be a relatively common sequence, generally
occurring on average about once per 1-10 x 10^2 nt,
preferably about once per 1-5 x 10^2 nt. A more stringent
selection for mixed-donor hybrids can be achieved by using
additional combinations of restriction/modification enzymes
(Casna (1986) supra). Optionally, one may remove unannealed
DNA (single-stranded DNA) at this time using any convenient
method, e.g. adsorption to BNDC.

Various combinations of enzymes can be employed. The
combination should ensure that there is at least one cut
in any homohybrid fragment, preferably at least two or
more, so the sequence should be relatively common. For
example, E. coli Dam methylase, which recognizes GATC for
methylation, may be employed for methylation. This
modification enzyme may then be used with the
methylation-sensitive restriction endonucleases DpnI and
MboI, which cleave at GATC sites, the former at
doubly-methylated sites and the latter at unmethylated
sites. The particular sequence "GATC" is found every few
hundred bp in the human genome. While a single
restriction/modification site is preferable, two or three
sites may be involved where different combinations of
modification and restriction enzymes are employed. With
DNA from sources other than human, other combinations of
modification enzymes and restriction enzymes may be
employed.
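
The selection logic of the Dam/DpnI/MboI combination can be written out as a small truth table. The sketch below is illustrative only; the source labels "A" (Dam-methylated) and "B" (unmodified) and the function names are hypothetical.

```python
from typing import Optional


def cutting_enzyme(top_methylated: bool, bottom_methylated: bool) -> Optional[str]:
    """Return the enzyme that cuts a GATC site with the given strand methylation."""
    if top_methylated and bottom_methylated:
        return "DpnI"   # doubly methylated site
    if not top_methylated and not bottom_methylated:
        return "MboI"   # unmethylated site
    return None         # hemimethylated site resists both enzymes


# Hypothetical labels: source A was Dam-methylated, source B was left unmodified.
METHYLATED = {"A": True, "B": False}


def hybrid_fate(top_source: str, bottom_source: str) -> str:
    """Fate of a duplex whose two strands come from the named sources."""
    enzyme = cutting_enzyme(METHYLATED[top_source], METHYLATED[bottom_source])
    return f"cut by {enzyme}" if enzyme else "uncut"


for pair in (("A", "A"), ("B", "B"), ("A", "B"), ("B", "A")):
    print(pair, hybrid_fate(*pair))
```

Only the two heterohybrid combinations come through uncut, which is the property the digestion step relies on.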

An alternative procedure is as follows. By cutting
the two DNA samples using a different restriction enzyme
for each, specifically two enzymes that share a common
recognition sequence but in one case cut with an N-base 3'
overhang, and in the other with an N-base 5' overhang, only
the heterohybrids will have flush ends. An example of
such a pair of restriction enzymes, for the case N=4, is
Acc65I and KpnI. The hybrids with both strands derived
from the same DNA sample will retain either a 5' or 3'
overhanging end.
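
The end geometry for the Acc65I/KpnI pair can be worked through explicitly. The sketch below assumes the cut positions G^GTACC for Acc65I and GGTAC^C for KpnI, which are taken from standard enzyme references rather than from the text; the function and constant names are hypothetical.

```python
# Cut offsets within the shared 6-nt site GGTACC, counted from the 5' end of
# each strand. Assumed from standard enzyme references, not from the text:
# Acc65I cuts G^GTACC (offset 1); KpnI cuts GGTAC^C (offset 5).
SITE_LENGTH = 6
CUT_OFFSET = {"Acc65I": 1, "KpnI": 5}


def end_structure(top_enzyme: str, bottom_enzyme: str) -> str:
    """Describe one end of a duplex whose strands were cut by the given enzymes.

    At this end the top strand presents its 5' terminus and the bottom strand
    its 3' terminus. Positions are in top-strand coordinates within the site.
    """
    top_start = CUT_OFFSET[top_enzyme]
    # The site is palindromic, so the bottom-strand cut maps to the mirror position.
    bottom_start = SITE_LENGTH - CUT_OFFSET[bottom_enzyme]
    overhang = bottom_start - top_start
    if overhang == 0:
        return "flush"
    if overhang > 0:
        return f"{overhang}-nt 5' overhang"   # top strand (5' end) protrudes
    return f"{-overhang}-nt 3' overhang"      # bottom strand (3' end) protrudes


for top, bottom in (("Acc65I", "Acc65I"), ("KpnI", "KpnI"),
                    ("Acc65I", "KpnI"), ("KpnI", "Acc65I")):
    print(f"top {top:6} / bottom {bottom:6}: {end_structure(top, bottom)}")
```

Because the site is palindromic, the same reasoning applies to the opposite end of the fragment.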

The flush-ended heterohybrids will uniquely be able
to be ligated to a flush-ended partner, such as an
oligonucleotide. This oligonucleotide might, for example,
have a hairpin structure, such that the "capped" ends are
protected from exonuclease digestion. Variations on this
principle are possible, exploiting the distinctive
structures of the ends of heterohybrids compared with
homohybrids, when related restriction enzyme pairs are
used.

This strategy for selecting heterohybrids can replace
the MboI/DpnI digestion step in selecting heterohybrids.
However, a 5' to 3' exonuclease, such as bacteriophage
lambda exonuclease, or a combination of a helicase and
exonuclease VII and/or I, or another combination of
enzymes to allow digestion of all uncapped ends would need
to be used in addition to, or in place of, exonuclease III,
following the MutHLS nicking step in the procedure
outlined below.

Other techniques may also be used for the initial
separation of homohybrids from heterohybrids. By growing
the cells from one of the sources with heavy
isotope-labeled nucleosides, heavy atom labeled nucleosides,
or other isotope labeled precursors, the two strands from
the different sources will differ in density. Isotopes that
may be used include 15N, 13C, 2H, etc. The duplexes may
then be separated by density banding.

Alternatively, one may label the DNA from the two
sources with different labels, e.g. using labeled
nucleotides and terminal deoxynucleotidyl transferase, by
random conjugation, and the like. One can then separate
all of the duplexes as to one label, and then divide that
group into homo- and heterolabeled duplexes. For example,
biotin and avidin may be used to separate at one stage,
where the avidin is bound to magnetic beads, and
2,4-dinitrophenyl and anti-(2,4-dinitrophenyl) may then be
used to separate at a second stage, where the
anti-(2,4-dinitrophenyl) is bound to a support.

Alternatively, one may select restriction enzymes
which provide for overhangs, where heterohybrids will
result in blunt ends or overhangs that differ from those
of the homohybrids. One may then use the overhangs to
separate homohybrids, leaving the heterohybrids.

Returning to the description of a principal
embodiment, the resulting mixture of uncut heterohybrids
and MboI- or DpnI-cut homohybrids is then subjected to a
system which allows for separation of DNA duplexes with
some mismatched base pairs from complementary
perfectly-matched DNA duplexes. See, for example, Lahue,
et al. (1989) Science 245:160-164; Su and Modrich (1986)
Proc. Natl. Acad. Sci. USA 83:5057-5061; Grilley, et al.
(1989) J. Biol. Chem. 264:1000-1004; Su, et al. (1989)
Genome 31:104-111; and Learn and Grafstrom (1989)
J. Bacteriol. 171:6473-6481.

Illustrative of such a system is the "methyl-directed
mismatch repair" pathway of E. coli. The system uses the
purified mutS, mutL, mutH and uvrD gene products, as well
as exonuclease I, exonuclease VII or RecJ exonuclease,
single strand binding (SSB) protein and DNA
polymerase III. In vitro, seven of the eight possible
single base mismatches, as well as small insertions and
deletions, are efficiently recognized. When DNA synthesis
is prevented, the system specifically introduces large
gaps in the mismatch-containing molecules. The purified
mutS, mutL and mutH proteins act in concert to introduce
nicks specifically into DNA molecules that contain base
mismatches. With the exception of the C-C mismatch, the
other mismatches are effectively identified by one or more
of the MutX enzymes (X indicates S, L or H).

Exonuclease III can initiate exonucleolytic digestion
at a nick and digest the nicked strand in the 3' to 5'
direction to produce a gap. Exonuclease III can also
initiate digestion at a recessed or flush 3' end, but it
cannot initiate digestion at a protruding 3' end of a
linear duplex. Digestion from the ends of the linear DNA
hybrids can, therefore, be prevented by choosing a
restriction enzyme that produces protruding 3' ends for
the initial digestion of the genomic DNA (Henikoff (1984)
Gene 28:351-360). The DNA ends produced by the
restriction enzymes used to make the smaller fragments for
distinguishing hemimethylated sites from unmethylated or
dimethylated sites are selected to provide recessed (e.g.
MboI) or flush (DpnI) 3' termini, and these termini are
susceptible to digestion by exonuclease III. Therefore,
exonuclease III provides for partial single strands in
hybrid molecules where both strands are derived from the
same individual or where the strands contain base
mismatches.
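
The rules for where exonuclease III can begin digestion lend themselves to a short sketch. The type and function names below are hypothetical, and the model is deliberately idealized: it records only the 3' end types of a linear duplex and whether it carries a nick.

```python
from enum import Enum
from typing import Iterable


class ThreePrimeEnd(Enum):
    PROTRUDING = "protruding"
    FLUSH = "flush"
    RECESSED = "recessed"


def exo3_can_start_at_end(end: ThreePrimeEnd) -> bool:
    """Exonuclease III starts at recessed or flush 3' ends, not protruding ones."""
    return end is not ThreePrimeEnd.PROTRUDING


def becomes_partly_single_stranded(ends: Iterable[ThreePrimeEnd], nicked: bool) -> bool:
    """True if exonuclease III can open a gap from a nick or from either end."""
    return nicked or any(exo3_can_start_at_end(e) for e in ends)


P, F, R = ThreePrimeEnd.PROTRUDING, ThreePrimeEnd.FLUSH, ThreePrimeEnd.RECESSED
examples = {
    "intact mismatch-free heterohybrid": ((P, P), False),
    "MboI-cut homohybrid": ((P, R), False),
    "DpnI-cut homohybrid": ((F, P), False),
    "nicked mismatched heterohybrid": ((P, P), True),
}
for name, (ends, nicked) in examples.items():
    print(f"{name}: {becomes_partly_single_stranded(ends, nicked)}")
```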

In carrying out the process, the duplexes obtained
from annealing and restriction digestion are exposed to
the methyl-directed mismatch repair system in vitro.
MutS, MutL and MutH introduce nicks specifically into
the mismatched DNA molecules, while the exonuclease III
can introduce gaps at nicked sites and at recessed 3'
termini. The mismatch-free heterohybrids, corresponding
to regions of identity-by-descent between the sources, can
now be distinguished from all other duplexes by virtue of
the absence of significant gaps or partial single strand
regions.

The partially single-stranded and single-stranded DNA
is now separated from the fully-duplex DNA. This can be
efficiently achieved using benzoylated naphthylated DEAE
cellulose (BNDC). At high salt concentration, BNDC
retains single-stranded and partially single-stranded DNA
molecules with high efficiency and may be separated by
centrifugation or other separation means. The unbound DNA
molecules are recovered, which in this case comprise
complementary sequences from the two sources.
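
The overall selection described so far can be summarized as a decision rule over the four classes of duplex. This is an idealized sketch, not a model of real yields: it assumes every fragment contains at least one GATC site and that every mismatch is recognized (the text notes the C-C mismatch is an exception). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duplex:
    heterohybrid: bool  # one strand from each source
    mismatched: bool    # contains a mismatch recognized by the repair proteins


def flows_through_bndc(duplex: Duplex) -> bool:
    """True if the duplex stays fully double-stranded and is not retained by BNDC."""
    # Homohybrid GATC sites are unmethylated or doubly methylated, so MboI or
    # DpnI cuts them and leaves 3' ends that exonuclease III can digest.
    cut_by_mboi_or_dpni = not duplex.heterohybrid
    # Mismatched duplexes are nicked, and exonuclease III opens the nick to a gap.
    nicked = duplex.mismatched
    return not (cut_by_mboi_or_dpni or nicked)


all_classes = [Duplex(h, m) for h in (True, False) for m in (True, False)]
recovered = [d for d in all_classes if flows_through_bndc(d)]
print(recovered)
```

Under these assumptions the only class recovered in the unbound fraction is the mismatch-free heterohybrid.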

A more complete methyl-directed mismatch repair
enzyme system may ultimately prove to be superior in
specificity to the simple system using only MutS, MutL and
MutH. For example, MutL, MutS and MutH, plus helicase II
(UvrD protein), exonuclease I, exonuclease VII,
single-strand binding protein, and DNA polymerase III,
acting in concert, can carry out mismatch-dependent DNA
synthesis, and thereby specifically introduce labeled or
modified nucleotides into mismatch-containing DNA
molecules. For example, by using biotinylated nucleotides,
mismatched DNA molecules can be specifically biotinylated,
and then immobilized avidin or related methods can be used
to remove the mismatched molecules from a mixture of
perfectly-matched and mismatched DNA molecules.
Alternatively, use of the same set of enzymes, excluding
DNA polymerase, would lead directly to gaps in the
mismatched DNA, eliminating the exonuclease III step. For
such a procedure, the ends of the linear DNA molecules
produced in the initial annealing step would need to be
protected from the action of the exonucleases and
helicase. This procedure would thus interface well with
the end-capping method suggested above as an alternative
method for selection for heterohybrid molecules.
It is fairly obvious that related enzyme systems
based on mismatch repair proteins from E. coli or other
organisms could in principle substitute here for the
particular enzyme system described.

The mismatch-free heterohybrid dsDNA duplexes may be
used without expansion. The subject method provides a
sufficiently clean separation of the mismatch-free
heterohybrid dsDNA duplexes in sufficient amount to allow
for their labeling and direct use as probes. By using a
readily available amount of DNA which can be efficiently
handled, generally from about 0.5 to 100 µg DNA, usually
about 1 to 10 µg DNA, from each source, a satisfactory
amount of the mismatch-free heterohybrid dsDNA duplexes
is obtained for labeling and probing a DNA sample.

The mismatch-free heterohybrid dsDNA sequences from
the two sources can be used for identifying the regions of
identity-by-descent between the two sources.
Conveniently, the dsDNA may be labeled for use as a probe
to identify the corresponding genomic regions. A wide
variety of labels may be employed, particularly
radioisotopes, fluorescers, enzymes, and the like. The
particular choice of label will depend upon the desired
sensitivity, the nature of the genomic sample being
probed, the sensitivity of detection required, and the
like. Thus, the more complex the genome under analysis,
the higher the sensitivity which would be desired.
Various instruments are available which allow for
detection of radioactivity, fluorescence, and the like.

The probes may be prepared by any convenient
methodology, such as nick translation, random hexamer
primed labeling, polymerase chain reaction using primers
that prime outward from dispersed repetitive sequences or
random sequences, and the like.

The DNA that is probed may take a variety of forms,
but essentially consists of a physically-ordered array of
DNA sequences that can be related back to the physical
arrangement of the corresponding sequences in the genome
(See Boyle, et al. (1990) Genomics 7:127-130; Pinkel,
et al. (1988) PNAS USA 87:6634-6638). A metaphase
chromosome spread is one naturally-occurring example of
such an array. Alternatively, and preferably, a partial
or a complete collection of cloned, amplified, or
synthetic DNA sequences corresponding to known genetic
locations, immobilized in an ordered array on a solid
substrate such as a membrane or a silicon or plastic chip,
can be used as the target for probing.

The hybridizations with the probes are performed
under conditions that allow the use of complex mixtures of
DNA probes and suppress artifactual hybridization to
repeated sequences (Boyle, et al. (1990) Genomics
7:127-133; Pinkel, et al. (1988) Proc. Natl. Acad. Sci.
USA 85:9138-9142; Lichter, et al. (1990) ibid
87:6634-6638). For example, in the case of
grandparent-grandchild pairs, hybridization should occur in
approximately 25 large patches (averaging about 4 µm in
length when prometaphase chromosomes are used), which in
aggregate should cover about one-half of the genome.
The boundaries of the patches will be determined by
the ends of chromosomes and sites of meiotic crossing
over. The boundaries will reflect sites of meiotic
recombination that occurred in meioses intervening between
the two relatives. Since the areas of hybridization will
typically be in large contiguous patches, the method is
very robust with respect to contamination of the probe
with sequences representing regions that are not identical
between the two subjects. Even if only a modest
enrichment of identical-by-descent sequences is achieved,
the patches of identity and non-identity should be
distinguishable as contiguous blocks of greater or lower
signal intensity.
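
The robustness that comes from contiguity can be illustrated with a toy calculation. The sketch below is not part of the described method: it assumes hybridization intensities have been read out at evenly spaced, ordered map positions, and it uses a simple moving average and threshold, chosen here only for illustration, to call blocks of higher and lower signal.

```python
from typing import List, Tuple


def call_blocks(signal: List[float], window: int,
                threshold: float) -> List[Tuple[int, int, bool]]:
    """Smooth an ordered signal and return runs as (start, end, is_high).

    `end` is exclusive. The window is truncated at the ends of the array.
    """
    n = len(signal)
    half = window // 2
    calls = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        mean = sum(signal[lo:hi]) / (hi - lo)
        calls.append(mean >= threshold)
    runs = []
    start = 0
    for i in range(1, n + 1):
        if i == n or calls[i] != calls[start]:
            runs.append((start, i, calls[start]))
            start = i
    return runs


# Toy data: a "high" block averaging 2.0 and a "low" block averaging 1.0,
# each with point-to-point scatter large enough to cross the threshold.
high = [1.2 if i % 2 == 0 else 2.8 for i in range(20)]
low = [0.2 if i % 2 == 0 else 1.8 for i in range(20)]
print(call_blocks(high + low, window=5, threshold=1.5))
```

With this toy input, individual positions cross the threshold in both blocks, yet the smoothed calls resolve into two runs split at position 20, despite only a twofold difference in mean signal.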

Alternatively, rather than using the selected
mismatch-free, heterohybrid restriction fragments as
probes, they themselves can be immobilized on a solid
substrate, typically a Southern blot performed after
resolving the DNA restriction fragments by gel
electrophoresis. The immobilized fragments can then be
probed by hybridization using labelled DNA probes specific
to regions of interest. The presence or absence of
hybridizing bands recovered from a specific pairwise
comparison will indicate whether the pair has identity by
descent at the locus in question. This procedure is
likely to be useful for refining the resolution of a map,
after initial mapping is achieved by using selected
fragments as probes.
Genetic regions identified by this mapping method may
be cloned by standard methods or in some cases by direct
cloning of the sequences selected by genomic mismatch
scanning. These sequences can then be analyzed for their
biological function, and in some cases used directly or in
synthetic or modified form for diagnostic or therapeutic
applications.

When the region of genetic identity between two or
more individuals is sufficiently small, for example, in
plant breeding when products of serial backcrosses with
selection for a useful trait are compared, it may be
useful to clone the selected identical-by-descent
restriction fragments, since the resulting clone pool
would be highly enriched for the desired gene sequence
(responsible for the selected trait).

An alternate embodiment of genetic mapping looks to
differences within an individual genome. One may look to
identify regions of a genetic map where an individual or a
sample is or is not heterozygous. This method is
predicated on the isolation of single-copy sequences
corresponding to regions of heterozygosity in the test
individual, based on the ability of that individual's DNA
to give rise to hybrid DNA molecules with base mismatches
when the individual's DNA is denatured and reannealed.
The selected mismatched hybrid sequences are then labeled
and used as probes for hybridization to a physically
ordered array of a genomic DNA sample. Because regions of
the genome that lack heterozygosity are unable to produce
single-copy DNA hybrids with base mismatches, these
regions are visualized as gaps in the hybridization
pattern. The following is an exemplary protocol.

The DNA sample is digested to completion with a
restriction enzyme that cuts the DNA frequently enough
that most of the resulting fragments should not contain
repetitive sequences. Illustrative restriction enzymes
include enzymes having four-nucleotide consensus
sequences, such as AluI, RsaI, TaqI, HaeIII and MboI. The
resulting fragments will for the most part be in the range
of about 200-4000 nucleotides. After melting or
denaturing the DNA, the DNA is allowed to reanneal in
solution, where the rapidly reannealing multi-copy or
repeated DNA sequences (low C0t) are removed, for example,
by hydroxyapatite chromatography, based on their rapid
reannealing, followed by allowing the remaining DNA to
anneal completely. Removal of the low C0t DNA is
desirable, but may not be essential. After complete
reannealing, desirably removing residual unannealed DNA,
the small, mostly single-copy DNA fragments that remain
are incubated with the methyl-directed mismatch repair
proteins described above and ATP to produce nicks in
mismatched DNA duplexes. The nicks introduced
specifically in mismatched molecules by the
methyl-directed mismatch repair system allow for
nick-translation DNA synthesis by the DNA polymerase, and
thus allow labeling with radiolabeled or other labeled
nucleotide triphosphates. A polymerase that lacks 3' to 5'
exonuclease activity, but retains 5' to 3' exonuclease
activity, e.g. Taq polymerase, is preferred for this step
to avoid labeling by replacement synthesis at the ends of
the fragments.

The described procedure will provide probes that
densely and specifically cover the regions of the genetic
map that are heterozygous in the test individual.
Conversely, the regions that are not heterozygous will not
provide labeled probes and so should be recognized as
distinctive gaps in the hybridization signal. In general,
except where the coefficient of inbreeding is very low,
the regions of homozygosity will be in contiguous patches
of sufficient size that even a few-fold, e.g. 3-10-fold,
difference in hybridization intensity between homozygous
and heterozygous sites should produce discernible
boundaries. Thus, as indicated previously, the separation
of heterozygous from homozygous sequences need not be
absolute. In addition to clinical and mapping
applications, this procedure is likely to be useful in
plant and animal breeding, since backcrossing and
selection can be used to isolate a gene responsible for a
trait of interest to a small region of heterozygosity or
homozygosity.
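
As a minimal sketch of how such gaps might be recognized,
the hybridization intensities read along an ordered array
can be scanned for contiguous runs of weak signal. The
function name, the fixed threshold, the minimum run length
and the example intensities below are all hypothetical and
serve only to illustrate the point that contiguity makes a
modest intensity difference sufficient.

```python
def homozygous_runs(intensities, threshold, min_run=3):
    """Return (start, end) index pairs, end exclusive, for runs of at
    least min_run consecutive ordered clones with signal below threshold."""
    runs, start = [], None
    for i, value in enumerate(intensities):
        if value < threshold:
            if start is None:
                start = i          # a weak-signal run begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None           # run ended (or was too short to count)
    if start is not None and len(intensities) - start >= min_run:
        runs.append((start, len(intensities)))
    return runs


# Hypothetical signal along an ordered clone array: heterozygous regions
# near 10, one homozygous patch about 5-fold weaker, and a single noisy
# clone at index 2 that is ignored because it is not part of a run.
signal = [10, 9, 2, 11, 10, 2, 2, 3, 2, 2, 9, 10]
print(homozygous_runs(signal, threshold=5))  # [(5, 10)]
```

Requiring a minimum run length is one simple way to exploit
the contiguity of homozygous patches; other smoothing or
segmentation schemes could equally be used.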

For convenience, kits may be supplied which provide
the necessary reagents together in a convenient form.
For example, for the genomic mismatch screening, kits
could be provided which would include at least two of the
following: restriction enzymes, one or more of which
provide for average fragments from the target genome of a
size in the range of about 0.5 - 10 x 10^4; one or more
modification enzymes; restriction enzymes which
distinguish between hemimethylated and unmethylated or
dimethylated consensus sequences; enzymes capable of
introducing nicks at mismatches and expanding the nick to
a gap of many (≥10) nucleotides; DNA polymerase; BNDC
cellulose; and labeled triphosphates or labeled linkers
for blunt end ligation or other composition for labeling
the sequences to provide probes. Other components such as
a physically ordered array of immobilized DNA genomic
clones or metaphase chromosomes, automated systems for
determining and interpreting the hybridization results,
software for analyzing the data, or other aids may also be
included depending upon the particular protocol which is
to be employed.

The subject methodology may find particular
application in mapping genes by use of affected relative
pairs, that is, pairs of relatives that have a genetically
influenced trait of interest. "Affected relative pair"
methods are preferred, particularly when the penetrance of
the allele that confers the trait is low or age-dependent,
or when the trait is multigenic or quantitative, e.g.
height and build. Disease-susceptibility genes are
particularly relevant. By determining where on the
genetic map a small set, including two, of "affected"
relatives have inherited identical sequences from a common
source, and disregarding other family members, a highly
efficient strategy for extracting linkage information from
a pedigree is provided. The resulting identity-by-descent
maps from multiple pairs of similarly-affected relatives
can be combined and the composite map searched for loci
where genotypic concordance between affected relatives
occurs more frequently than would be expected by chance.
With a sufficiently large number of affected relative
pairs, such an analysis can reveal the positions of genes
that contribute even a slight susceptibility to the trait.
The procedure may also find wide application in routine
screening for shared genetic risks in families.
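
One way the search of a composite map for excess
concordance might be sketched is a one-sided binomial test
at each map interval. The sketch assumes that every pair
has the same relationship (so that a single chance-sharing
probability p0 applies), that pairs are independent, and
that no correction for testing many intervals is made. The
interval names, counts, p0 and significance level are
hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def excess_sharing(shared_counts, n_pairs, p0, alpha):
    """Return {interval: p_value} for intervals where the number of pairs
    identical by descent is improbably high under chance sharing p0."""
    hits = {}
    for interval, k in shared_counts.items():
        p_value = binomial_upper_tail(k, n_pairs, p0)
        if p_value < alpha:
            hits[interval] = p_value
    return hits


# Hypothetical composite map: number of affected pairs (out of 40) found
# identical by descent at each interval. p0 = 0.25 is the chance-sharing
# probability assumed here for a single relationship type.
counts = {"interval_1": 11, "interval_2": 9,
          "interval_3": 24, "interval_4": 12}
print(excess_sharing(counts, n_pairs=40, p0=0.25, alpha=0.001))
```

In this invented data set only interval_3 stands out. A
practical analysis would need to combine pairs of different
relationship types and to allow for the number of intervals
examined.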

The following examples are offered by way of
illustration and not by way of limitation.

EXPERIMENTAL

Saccharomyces cerevisiae, baker's yeast, was used as
a test system, because this is genetically the best-
characterised eukaryotic organism, and it is easy to
prepare and characterize yeast clones of defined genetic
relatedness.

Test of the method: Two independently isolated
haploid clones of Saccharomyces, Y55 (HO his4 leu2 CAN1^S
ura3 GAL2) and Y24 (ho HIS4 LEU2 MATa can1^R URA3 GAL2),
both derivatives of common lab strains, were used for the
experiment. We estimate from RFLP analysis that Y55 and
Y24 (a derivative of S288C) differ at approximately one
base pair per 100. The two strains were mated, and the
resulting diploid hybrid was sporulated, yielding 4
haploid spore clones. For any given genetic locus, each
spore clone ("daughter") received either the Y55 or the
Y24 allele. The purpose of the test was to determine if
we could specifically isolate, en masse, DNA from all the
loci at which two individuals (here, pairs of parent and
daughter clones) share genetic identity, excluding DNA
from all regions where there was no identity by descent
between the pair. We applied our genome mismatch scanning
method to determine, for several loci, which spore clones
had identity by descent with each parent. The results of
the genome mismatch scanning analysis were compared with
results from conventional analysis (Table 1), using
auxotrophic and drug resistance markers. Conventional
analysis consisted of testing for growth on appropriate
selective media. Four loci were tested: HIS4, CAN1, URA3,
GAL2. HIS4 is on chromosome 3, CAN1 and URA3 on
chromosome 5, and GAL2 is on chromosome 12. Our analysis
of these loci included a total of 15 independent PstI
restriction fragments, each of which constituted an
independent test of the genomic mismatch scanning method,
as their sometimes adjacent location in the genome was
immaterial to their behaviour in the selection. The
result of the test was that all 15 PstI restriction
fragments analyzed were recovered if and only if they were
identical by descent between the parent and daughter being
compared (Tables 1 and 2). This result confirms the
principles underlying genomic mismatch scanning.
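
The dependence of the selection on fragment size can be
illustrated with a simple estimate. If two strains differ
at a fraction d of base pairs, and the differences are
taken to be independent and evenly distributed (an
assumption made here only for illustration), the
probability that a heterohybrid fragment of L bp from a
region not identical by descent carries at least one
mismatch is 1 - (1 - d)^L. The sketch below evaluates this
for d = 0.01, the divergence estimated above for Y55 and
Y24; the function name and the fragment lengths are
hypothetical.

```python
def p_at_least_one_mismatch(length_bp, divergence):
    """Probability that a fragment of length_bp contains one or more
    mismatches, assuming independent differences at rate divergence."""
    return 1.0 - (1.0 - divergence) ** length_bp


# Divergence of about one base pair per 100, as estimated for Y55 and Y24.
for length in (100, 500, 1000, 5000):
    print(length, round(p_at_least_one_mismatch(length, 0.01), 5))
```

Under these assumptions a 100 bp fragment would carry a
mismatch only about 63% of the time, whereas fragments of a
kilobase or more would almost always do so, which is why
larger fragments suit this selection.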
Procedure:

1. DNA Isolation: High molecular weight DNA was
isolated from each yeast strain (parent or spore clone) by
a standard method (Methods in Enzymology 194
(chapter 11): 169-182).

2. Initial Restriction Enzyme Digestion: Each DNA
sample was digested completely with PstI restriction
enzyme. The DNA was recovered by phenol:chloroform
extraction and ethanol precipitation, and resuspended in
Tris-HCl 10 mM, EDTA 1 mM, pH 8.0.

3. Methylation of DNA From Parental Strains with
Dam Methylase: DNA samples from the two parental strains,
Y55 and Y24, were fully methylated with E. coli Dam
methylase (New England Biolabs), at a DNA concentration of
0.25 mg/ml, using 4 units of enzyme/µg of DNA in an
overnight incubation at 37°, in the buffer recommended by
the manufacturer. The samples were extracted with
phenol:chloroform, ethanol precipitated, and resuspended
in Tris-HCl 10 mM, EDTA 1 mM, pH 8.0.

4. Mixing and Solution Hybridization of Paired Test
DNA Samples:

A. Y55 DNA (5 µg in 45 µl) + spore clone 1b DNA
(5 µg in 80 µl) was denatured by adding 7.5 µl of 5M NaOH.
After 10 min at room temperature, the sample was
neutralized by adding 16 µl of 3 M MOPS acid. 32 µl of
formamide and 200 µl of 2X PERT buffer (4M NaSCN, 20 mM
Tris-HCl pH 7.9, 0.2 mM EDTA) were then added, and the
sample was adjusted to 400 µl with water. 90% phenol in
water was added until an emulsion was apparent (about
80 µl), and then the sample was agitated to maintain the
emulsion for 12 hours at room temperature (typically about
23°).

B. Y55 DNA (5 µg in 45 µl) + spore clone 1c DNA
(5 µg in 45 µl) was denatured by adding 5.4 µl of 5M NaOH.
After 10 min at room temperature, the sample was
neutralized by adding 12 µl of 3 M MOPS acid. 32 µl of
formamide and 200 µl of 2X PERT buffer were then added,
and the sample was adjusted to 400 µl with water. 90%
phenol in water was added until an emulsion was apparent
(about 150 µl), and then the sample was agitated to
maintain the emulsion for 12 hours at room temperature
(typically about 23°).

C. Y24 DNA (5 µg in 45 µl) + spore clone 1a DNA
(5 µg in 45 µl) was denatured by adding 5.4 µl of 5M NaOH.
After 10 min at room temperature, the sample was
neutralized by adding 12 µl of 3 M MOPS acid. 32 µl of
formamide and 200 µl of 2X PERT buffer were then added,
and the sample was adjusted to 400 µl with water. 90%
phenol in water was added until an emulsion was apparent
(about 150 µl), and then the sample was agitated to
maintain the emulsion for 12 hours at room temperature
(typically about 23°).

D. Y24 DNA (5 µg in 45 µl) + spore clone 1c DNA
(5 µg in 45 µl) was denatured by adding 5.4 µl of 5M NaOH.
After 10 min at room temperature, the sample was
neutralized by adding 12 µl of 3 M MOPS acid. 32 µl of
formamide and 200 µl of 2X PERT buffer were then added,
and the sample was adjusted to 400 µl with water. 90%
phenol in water was added until an emulsion was apparent
(about 150 µl), and then the sample was agitated to
maintain the emulsion for 12 hours at room temperature
(typically about 23°).

To recover DNA, the samples were each extracted once
with chloroform, then ethanol precipitated, and
resuspended in 200 µl of Tris/EDTA.

5. Digestion of Homohybrid Molecules (Both Strands
From the Same Source) with DpnI and MboI: 105 µl of
each of the homohybrid strands was digested at 37° for 2
hours in a final volume of 400 µl of NEB buffer 3 with 100
units of DpnI and 25 units of MboI.
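
The logic of steps 3 to 5 can be summarized in a short
sketch: parental DNA carries Dam methylation at GATC sites
and spore-clone DNA does not, so after hybridization the
methylation state of a duplex reports the origin of its
two strands. DpnI cleaves fully methylated GATC and MboI
cleaves unmethylated GATC. Treating hemimethylated sites
as completely resistant to both enzymes is a
simplification, and the names used below are hypothetical.

```python
# Dam methylation status of each DNA source in the experiment above:
# parental DNA was methylated in step 3, spore-clone DNA was not.
DAM_METHYLATED = {"parent": True, "spore": False}


def gatc_state(source_top, source_bottom):
    """Methylation state of a GATC site in a duplex whose two strands
    come from the named sources."""
    n_methylated = DAM_METHYLATED[source_top] + DAM_METHYLATED[source_bottom]
    return ("unmethylated", "hemimethylated", "fully methylated")[n_methylated]


def survives_dpni_mboi(state):
    """DpnI cuts fully methylated GATC and MboI cuts unmethylated GATC;
    hemimethylated GATC is treated here as resistant to both."""
    return state == "hemimethylated"


for top, bottom in [("parent", "parent"), ("spore", "spore"),
                    ("parent", "spore")]:
    state = gatc_state(top, bottom)
    print(top, "+", bottom, "->", state,
          "| survives:", survives_dpni_mboi(state))
```

Only the parent/spore heterohybrids survive in this model,
provided the fragment contains at least one GATC site.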

6. Removal of Residual Unannealed DNA: After
MboI/DpnI digestion, samples were extracted with
phenol/chloroform, 100 µl of 5M NaCl was added to each and
then samples were incubated with 100 mg of BNDC cellulose
(Sigma) equilibrated with 50 mM Tris, pH 8.0, 1M NaCl, at
4° for 3 hours. The sample was centrifuged at 14 000 rpm
in a microfuge, then the supernatant was extracted twice
with phenol/chloroform and once with chloroform, then
ethanol precipitated, washed with 70% ethanol, dried and
resuspended in 90 µl of Tris-HCl 10 mM, EDTA 1 mM, pH 8.0.

WO 93/22462 PCT/US93/04160

7. Selective Nicking of Mismatched Hybrid DNAs:
15 µl of each DNA sample was mixed with 5.2 ng of MutH
protein, 340 ng of MutL protein, 700 ng of MutS protein
(all proteins provided in purified form by Paul Modrich,
Duke University), in a final volume of 60 µl of a buffer
consisting of: 50 mM Hepes (pH 8.0), 20 mM KCl, 5 mM
MgCl2, 1 mM DTT, 50 µg/ml bovine serum albumin, and 2 mM
ATP. The mixture was incubated at 37° for 30 minutes, and
the reaction was then stopped by heating to 65° for 10
minutes.

8. Exonuclease III Digestion to Convert Nicks into
Single-Strand Gaps, and Ends from MboI or DpnI Cleavage
into Single-Strand Tails: The volume of the entire sample
from step 7 was adjusted to 200 µl by adding 140 µl of a
buffer consisting of: 50 mM Tris-HCl (pH 8.0), 5 mM
MgCl2, 10 mM β-mercaptoethanol. Then 10 units of
exonuclease III were added and incubation continued for 10
min at 37°. This reaction was stopped by adding EDTA to
10 mM, followed by extraction with phenol/chloroform.

9. Removal of Partially or Fully Single-Stranded
DNA Molecules from the Mixture: 50 µl of 5M NaCl + 250 µl
of 1M NaCl were added to adjust to a volume of 500 µl at a
concentration of 1 M NaCl. 100 mg of BNDC cellulose
equilibrated with 50 mM Tris, pH 8.0, 1M NaCl, was added
and the mixture incubated at 4° for 3 hours (Sedat,
et al. (1967) J. Mol. Biol. 26:537-540; Iyer and Rupp
(1971) Biochim. Biophys. Acta 228:117-126). The mixture was
centrifuged at 14 000 rpm for 1 min, then the supernatant
was extracted once with phenol/chloroform, and ethanol
precipitated overnight. The small pellets were
resuspended in 15 µl of Tris-HCl 10 mM, EDTA 1 mM, pH 8.0.

10. Analysis of the Selected DNA Pool by Southern
Blotting: One-sixth of each of the resulting DNA samples
was electrophoresed through a 0.7% agarose gel, in TBE
buffer for 15 hours at 70 volts. DNA was transferred to a
nylon filter by Southern blotting, and the filter was
probed successively with labelled DNA from lambda phage
clones corresponding to the 4 specific genetic loci, HIS4,
CAN1, URA3 and GAL2. In each case, 3-5 restriction
fragments were readily detected in the lanes corresponding
to DNA samples that had identity by descent at the test
loci, and not in the lanes corresponding to samples that
were known from direct tests not to match at the locus
being probed (see Tables 1 and 2).
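
The volume and salt adjustments in the procedure can be
checked with simple mixing arithmetic. The sketch below
reproduces the figures of step 9 (500 µl at 1 M NaCl) and
the dilution of the 2X PERT buffer in step 4. It assumes
that the 200 µl sample entering step 9 contributes no NaCl
and retains its volume after extraction; the function name
is hypothetical.

```python
def mix(components):
    """components: list of (volume_ul, concentration_M) pairs.
    Returns (total_volume_ul, final_concentration_M)."""
    total_volume = sum(volume for volume, _ in components)
    total_solute = sum(volume * conc for volume, conc in components)
    return total_volume, total_solute / total_volume


# Step 9: 200 µl sample from step 8 (NaCl taken as 0 M, an assumption)
# + 50 µl of 5 M NaCl + 250 µl of 1 M NaCl.
print(mix([(200, 0.0), (50, 5.0), (250, 1.0)]))   # (500, 1.0)

# Step 4: 200 µl of 2X PERT buffer (4 M NaSCN) brought to 400 µl final.
print(mix([(200, 4.0), (200, 0.0)]))              # (400, 2.0)
```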



Table 1.

                         Locus
Strain            CAN1   URA3   HIS4   GAL2

parents     Y24    A      A      A      A
            Y55    B      B      B      B

daughters   1a     B      B      A      A
            1b     A      B      A      A
            1c     A      A      B      B

The two alleles at each locus are designated A and B,
respectively for the alleles present in Y24 and Y55. Each
spore clone inherits an allele from one of its two
parents, either the A allele from Y24 or the B allele from
Y55. The alleles at these loci can be distinguished
directly by testing for growth in specific media.
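
The identity-by-descent pattern implied by Table 1 for any
parent/daughter pair can be derived mechanically: a locus
is identical by descent when the daughter carries the
allele of the parent in question. The sketch below encodes
Table 1 and lists, for the four pairs hybridized in step 4,
where recovery of DNA would be expected on that basis. The
function and variable names are hypothetical, and the
output is a prediction from Table 1 alone, not a record of
the experimental result.

```python
LOCI = ["CAN1", "URA3", "HIS4", "GAL2"]

# Alleles from Table 1, one letter per locus in the order of LOCI.
PARENTS = {"Y24": "AAAA", "Y55": "BBBB"}
DAUGHTERS = {"1a": "BBAA", "1b": "ABAA", "1c": "AABB"}


def expected_recovery(parent, daughter):
    """Return {locus: '+' or '-'}: '+' where the daughter carries the
    parent's allele (identity by descent), '-' otherwise."""
    return {locus: "+" if p_allele == d_allele else "-"
            for locus, p_allele, d_allele
            in zip(LOCI, PARENTS[parent], DAUGHTERS[daughter])}


# The four pairs hybridized in step 4 (A to D) of the procedure.
for parent, daughter in [("Y55", "1b"), ("Y55", "1c"),
                         ("Y24", "1a"), ("Y24", "1c")]:
    print(parent, daughter, expected_recovery(parent, daughter))
```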
Table 2. Summary of the results of the Genome Mismatch
Scanning test.

                                          LOCUS
Comparison #   Relative Pair        CAN1   URA3   HIS4   GAL2

1              Y24/daughter 1a                     +      +
2              Y55/daughter 1c                            +
3              Y24/daughter 1c       +      +
4              Y55/daughter 1b              +

number of restriction fragments     [4]    [4]    [3]    [5]


"-" indicates no DNA was recovered for any of the
restriction fragment bands detected by the DNA probe
specific for the indicated locus (neglecting faint bands
from cross-hybridization to unlinked sequences).

"+" indicates recovery of DNA in all restriction bands
detected by the DNA probe specific to the indicated locus.

The number of restriction fragment bands surveyed by the
probe used for the indicated locus is indicated in
brackets in the bottom row of the table. The probes used
in each case were bacteriophage lambda clones from the
ordered collection established by Maynard Olson. The
specific clones used as probes were: CAN1: clone 5917,
HIS4: clone 4711, URA3: clone 6150, GAL2: clone 6637. The
clone numbers are the numbers assigned by Maynard Olson.
For convenience, only 4 of the eight possible parent-
daughter combinations were tested. The results of the
genetic tests for 4 loci are shown in Table 1. Numerous
other pairwise comparisons have subsequently been tested
with similar results.



It is evident from the above results that the
subject methodology provides for numerous advantages. The
methods provide access to a large set of highly
polymorphic markers required for linkage mapping with
small family units. A great increase in the effective
number of informative markers is achieved without a
corresponding increase in the number of individual tests,
since all the markers are screened in parallel in a single
procedure. By allowing much smaller sets of related
individuals to be used for linkage mapping, the affected-
relative-pair and homozygosity-by-descent mapping methods
can greatly reduce the cost and labor involved in
developing the human genetic map. Genomic mismatch
scanning allows for the practical application of linkage
mapping to genetically heterogeneous or quantitative
traits, such as cardiovascular disease, asthma,
psychiatric disorders, epilepsy, obesity, cancer, and
diabetes.

The subject methodology does not rely on any
previously-mapped genetic markers. Thus, one can use the
subject methodology to begin immediately to develop the
genetic and physical maps of a genome for which little or
no prior map information is available. This can find
particularly important application in the breeding of
plant or animal species, as well as in development of the
genetics of such species.

Each pair-wise analysis allows sites of meiotic
recombination to be mapped. In grandparent-grandchild
pairs, identity-by-descent maps specifically identify the
sites of meiotic recombination in the corresponding
parent. Questions such as the relationship between the
genetic and physical map, locations of sites of enhanced
or diminished recombination, effects of age, sex and other
factors on the frequency and distribution of meiotic
recombination events, and the relationship between
recombination and non-disjunction can be readily
investigated in this way.

Finally, the ability to detect directly regions of
the genome that have lost heterozygosity may be useful in
identifying putative tumor-suppressor genes and in the
earlier diagnosis of malignancies, since loss of
heterozygosity at specific loci appears to be an important
genetic event in the development of many cancers.

All publications and patent applications cited in
this specification are herein incorporated by reference as
if each individual publication or patent application were
specifically and individually indicated to be incorporated
by reference.

Although the foregoing invention has been described
in some detail by way of illustration and example for
purposes of clarity of understanding, it will be readily
apparent to those of ordinary skill in the art in light of
the teachings of this invention that certain changes and
modifications may be made thereto without departing from
the spirit or scope of the appended claims.

WHAT IS CLAIMED IS:

1. A method for separating DNA duplexes capable of
being used for genetic mapping or identification, from a
complex mixture of DNA from two related sources, wherein
each of said sources contributes at least about 5 x 10^4 bp
of DNA, said method comprising:

digesting the DNA from first and second related
sources to provide restriction fragments, wherein said DNA
from said first and second related sources is
distinguishable as a result of differential modification
of said DNA or use of different restriction enzymes in
said digesting, to provide DNA duplexes consisting of
homohybrids and heterohybrids;

separating homohybrids from heterohybrids;

introducing lesions in heterohybrids having
mismatches; and

isolating heterohybrids having lesions from perfectly
complementary heterohybrids.

2. A method according to Claim 1, wherein said
different modification and separating is by means of:

methylating the DNA from one of said related sources
or methylating the DNA from both of said sources with
different methylases;

cleaving said DNA duplexes with a methyl sensitive
restriction enzyme resulting in cleaving of homohybrid
DNA; and

segregating heterohybrid DNA from cleaved homohybrid
DNA;

and said introducing lesions is by means of:

bringing together said heterohybrid DNA with enzymes
of the methyl-directed mismatch repair pathway and an
exonuclease, whereby nicks are introduced into said
heterohybrid DNA comprising a lesion and said lesion is
extended into a gap by said exonuclease to provide
partially single stranded and single stranded DNA;

dividing said partially single-stranded and single
stranded DNA from said perfectly complementary DNA
duplexes; and

labeling said perfectly complementary DNA duplexes
for use in genetic mapping or identification.

3. A method according to Claim 2, wherein said
enzymes comprise MutL, MutS, and MutH of E. coli.

4. A method according to Claim 2, wherein included
in said combining step is helicase II, single-strand
binding protein, and at least one of exonuclease I and
VII; or wherein said exonuclease is exonuclease III of
E. coli.

5. A method according to Claim 2, wherein said
dividing comprises:

combining said partially single-stranded and single
stranded DNA and DNA duplexes with benzoylated
naphthylated DEAE cellulose (BNDC) at high salt
concentration; and

freeing the DNA bound to the BNDC cellulose from the
perfectly complementary DNA duplexes.

6. A method for separating DNA duplexes capable of
being used for genetic mapping or identification, from a
complex mixture of DNA from two related genomes, wherein
each of said genomes contributes at least about 10^6 bp of
DNA, said method comprising:

combining DNA restriction fragments from first and
second related genomes, wherein said DNA substantially
consists of restriction fragments comprising a GATC
sequence, under melting and reannealing conditions to form
homohybrid and heterohybrid DNA duplexes, wherein said DNA
fragments from said first and second sources are
different;

segregating homohybrid from heterohybrid DNA duplexes
by means of the difference in said DNA duplexes;

bringing together heterohybrid DNA duplexes with
enzymes consisting of MutL, MutS, and MutH of E. coli,
resulting in lesions consisting of nicks in DNA duplexes
that contain mismatches;

treating the resulting mixture of nicked and unnicked
DNA molecules with exonuclease III, such that nicked DNA
molecules in the mixture are rendered partially single
stranded; and

separating said partially single stranded DNA from
completely double stranded DNA.

7. A method according to Claim 6, wherein said
method comprises one of:

including in said bringing together, or in a
subsequent step, a DNA polymerase, and labeled
nucleotides, wherein said labeled nucleotides become
incorporated into said nicked DNA duplexes; and

separating said labeled DNA from unlabeled DNA by
means of said label; or

combining said partially single-stranded and single
stranded DNA and completely double stranded DNA with
benzoylated naphthylated DEAE cellulose at high salt
concentration; and

separating the DNA bound to the cellulose from the
completely double stranded DNA; or

cleaving partially single stranded DNA at the site of
said single strand to provide small DNA duplexes; and

separating said small DNA duplexes from uncleaved DNA
duplexes.

8. A method according to Claim 6, wherein said DNA
restriction fragments are different in having different
termini, wherein the difference in termini is used to
separate heterohybrids from homohybrids.

9. A method according to Claim 6, wherein said
bringing together further comprises:

including with said heterohybrid DNA, helicase II,
exonuclease I and/or VII, single strand binding protein,
DNA polymerase III, and labeled nucleotides, wherein said
labeled nucleotides become incorporated into partially
single stranded DNA to provide labeled DNA; and

separating said labeled DNA from unlabeled DNA by
means of said label.

10. A method for identifying nucleic acid areas of
identity with a probe obtained by a separating method for
separating DNA duplexes from a complex mixture of DNA from
two related sources, wherein each of said sources
contributes at least about 5 x 10^4 bp of DNA, said method
comprising:

digesting the DNA from first and second related
sources to provide restriction fragments, wherein said DNA
from said first and second related sources is
distinguishable as a result of differential modification
of said DNA or use of different restriction enzymes in
said digesting, to provide DNA duplexes consisting of
homohybrids and heterohybrids;

separating homohybrids from heterohybrids;

introducing lesions in heterohybrids having
mismatches; and

isolating heterohybrids having lesions from perfectly
complementary heterohybrids; and

labeling said perfectly complementary heterohybrids;

said method comprising:

combining said probe with an ordered array of DNA
molecules representing at least a portion of a genetic map
under conditions wherein said probe hybridizes to
homologous DNA; and

detecting the areas of identity by means of said
label.

11. A method according to Claim 10, wherein said
ordered array is a metaphase chromosome spread.

12. A method according to Claim 10, wherein said
ordered array is an array of clones representing at least
a portion of said genetic map.

13. A method according to Claim 10, wherein said
ordered array is an array of DNA sequences amplified
in vitro, representing at least a portion of said genetic
map.

14. A method for identifying DNA sequences which are
heterozygous for the same locus in the genome of an
individual, said method comprising:

digesting genomic DNA with at least one restriction
enzyme to provide small DNA fragments of from about 200 to
5000 bp;

melting and reannealing said DNA fragments;

combining said DNA fragments with enzymes of the
methyl-directed mismatch repair pathway, whereby nicks are
introduced into the mismatch-containing fragments of said
DNA;

labeling said nicked DNA fragments to provide probes;

combining said probes with an ordered array of DNA
molecules representing at least a portion of said genome
under conditions wherein said probe hybridizes to
homologous DNA; and

identifying said DNA sequences by means of said
label.

15. A method according to Claim 14, wherein said
label comprises incubating said lesion containing small
DNA fragments with a polymerase which lacks 3' to 5'
exonuclease activity and labeled nucleotide triphosphates.

16. A method for identifying DNA sequences which are
heterozygous for the same locus in the genome of an
individual, said method comprising:

digesting genomic DNA with at least one restriction
enzyme to provide small DNA fragments of from about 200 to
5000 bp;

melting and reannealing said DNA fragments;

combining said DNA fragments with enzymes of the
methyl-directed mismatch repair pathway, whereby nicks are
introduced into the mismatch containing fragments of said
DNA;

treating said DNA fragments with an exonuclease such
that those fragments with nicks are rendered single-
stranded or partially single-stranded;

separating single-stranded and partially single-
stranded fragments from completely double-stranded DNA
fragments by adsorption to BNDC;

labeling said single stranded or partially single-
stranded small DNA fragments or labeling perfectly
complementary dsDNA to provide probes;

combining said probes with an ordered array of DNA
molecules representing at least a portion of said genome
under conditions wherein said probe hybridizes to
homologous DNA; and

identifying said DNA sequences by means of said
label.

17. A method according to Claim 16, wherein said
labeling of said lesion containing partially single
stranded DNA comprises adding DNA polymerase III and
nucleotide triphosphates including a label to said
partially single stranded DNA.

18. A method of genetic mapping comprising:

combining fragmented first genomic DNA from two
different sources under conditions wherein said fragments
can anneal together to form heterohybrid dsDNA duplexes
and homohybrid dsDNA duplexes, wherein the size of the
fragments is selected to substantially ensure that
heterohybrids formed from genetic regions which are not
identical by descent will contain at least one base
mismatch and to ensure the ability to separate perfectly
matched duplexes from mismatched duplexes;

separating heterohybrid duplexes from homohybrid
duplexes and matched duplexes from mismatched duplexes by
preferentially modifying heterohybrid duplexes and
mismatched homohybrid duplexes;

labeling at least one of the strands of said matched
heterohybrid duplexes to provide detectable probes without
expansion of said matched heterohybrid duplexes; and

combining second genomic DNA with said detectable
probes to detect and/or map sites of matched DNA of said
sources.

19. A method according to Claim 18, wherein said
method further comprises:

methylating the DNA from at least one of said sources
at specific consensus sequences with a different enzyme
for the DNA from each source; and

said separating comprises:

cleaving said duplexes with at least one restriction
endonuclease, wherein duplexes which are unmethylated or
doubly methylated are cleaved to provide smaller
fragments;

isolating the uncleaved duplexes free of said cleaved
duplexes;

nicking mismatched duplexes and introducing gaps in
said nicked mismatched duplexes while leaving matched
duplexes unchanged; and

segregating gapped duplexes from matched duplexes.

20. A method according to Claim 19, wherein said
nicking and introducing gaps employs the proteins of a
methyl-directed mismatch repair pathway.

21. A method according to Claim 19, wherein said
nicking and introducing gaps employs the proteins of the
methyl-directed mismatch repair pathway and exonuclease
III.

22. A method according to Claim 19, wherein said
segregating comprises combining said duplexes with BNDC
cellulose and separating DNA bound to said BNDC cellulose
from unbound duplexes.

23. A method according to Claim 18, wherein said
second genomic DNA is an ordered array of DNA molecules
representing at least a portion of a genetic map.

24. A kit comprising MutL, MutS, MutH enzymes,
labeled triphosphates or labeled linkers, and a DNA
polymerase.

25. A kit according to Claim 24, further comprising
at least one of the following: a modification enzyme, and
an enzyme that cleaves at other than hemimethylated sites
recognized by said modification enzyme; an exonuclease;
helicase II; single-strand binding protein; BNDC
cellulose; an ordered array of DNA molecules comprising at
least a portion of a genetic map.


INTERNATIONAL SEARCH REPORT

International application No.
PCT/US93/04160

A. CLASSIFICATION OF SUBJECT MATTER

IPC(5): C12Q 1/68; C12P 19/34
US CL: 435/6, 91; 935/77, 78
According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)
U.S.: 435/6, 91; 935/77, 78

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practicable, search terms used)

C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category*   Citation of document, with indication, where appropriate, of the relevant passages   Relevant to claim No.

Nucleic Acids Research, Vol. 14, Number 18, issued 1986, CASNA
et al., "Genomic Analysis II: Isolation of High Molecular Weight
Heteroduplex DNA Following Differential Methylase Protection and
Formamide-PERT Hybridization", pages 7285-7303; see pages
7289-7296 and pages 7286-7287.
Relevant to claim No.: 1-5, 10-15, 18-23

Nucleic Acids Research, Vol. 14, Number 18, issued 1986,
SANDA, et al., "Genomic Analysis I: Inheritance Units and Genetic
Selection in the Rapid Discovery of Locus Linked DNA Markers",
pages 7265-7283, see pages 7266, 7268-7271 and 7281.
Relevant to claim No.: 1-23

[X] Further documents are listed in the continuation of Box C.   [ ] See patent family annex.




Date of the actual completion of the international search 
23 AUGUST 1993 



Date of mailing of the international search report 

SEP 07 1993 



Name and mailing address of the ISA/US 
Commissioner of Patents and Trademarks 
Box PCT 

Washington, D.C. 20231 
Facsimile No. NOT APPLICABLE 



Authorized officer 

STEPHANIE W. ZITOMER, Ph.D. 
Telephone No. (703) 308-0196 




Form PCT/ISA/210 (second sheet)(July 1992)* 






C (Continuation). DOCUMENTS CONSIDERED TO BE RELEVANT 

Category*   Citation of document, with indication, where appropriate, of the relevant passages   Relevant to claim No. 

Y   SCIENCE, Vol. 245, issued 14 JULY 1989, LAHUE et al., 
    "DNA Mismatch Correction in a Defined System", pages 160-164; 
    see entire document. 
    Relevant to claim No.: 2-6, 14-22, 24 

Y   FEBS LETTERS, Vol. 228, Number 2, issued FEBRUARY 1988, 
    NORRIS et al., "Sizing of Single-Stranded Regions in Double-
    Stranded DNA by Preparative Benzoylated DEAE-Cellulose 
    Chromatography", pages 223-227; see entire document. 
    Relevant to claim No.: 6, 16, 17, 22, 24, 25 

Y   T. MANIATIS et al., MOLECULAR CLONING, A 
    LABORATORY MANUAL, published 1982 by Cold Spring 
    Harbor Laboratory (New York), 545 pages; see pages 143 v 108, 
    109. 
    Relevant to claim No.: 6-9, 15-17 



Form PCT/ISA/210 (continuation of second sheet)(July 1992)* 



